Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core: Store object name separately for symbols #13113

Merged
merged 1 commit into from
Dec 7, 2024

Conversation

CelestialAmber
Copy link
Contributor

This PR adds code that checks for extra information following symbol names that is often present in CodeWarrior symbol maps, namely the names of the .a/.o files that symbol belongs to. If present, the symbol name string is truncated to remove the extra text from the symbol name.

@JMC47
Copy link
Contributor

JMC47 commented Oct 10, 2024

@dolphin-emu-bot rebuild

@CelestialAmber CelestialAmber marked this pull request as draft October 10, 2024 04:51
@CelestialAmber CelestialAmber changed the title PPCSymbolDB: remove extra info from symbol names if present [DRAFT] PPCSymbolDB: remove extra info from symbol names if present Oct 10, 2024
@CelestialAmber
Copy link
Contributor Author

I'm changing this to draft as I think it would be better to also include code to handle the file name instead of just discarding it. I would like to also update the symbol view accordingly to still show file names but that might be better as a separate PR so I'm open to discussions on that.

@CelestialAmber CelestialAmber marked this pull request as ready for review October 10, 2024 20:52
@CelestialAmber
Copy link
Contributor Author

Ok, ready for review again.

@CelestialAmber CelestialAmber changed the title [DRAFT] PPCSymbolDB: remove extra info from symbol names if present PPCSymbolDB: Store symbol object names from loaded symbol maps separately Oct 10, 2024
@sepalani
Copy link
Contributor

I suspect that this change can break other symbol map formats. Moreover, the object might not necessarily be a .o object file.

IIRC, the stripped symbol name is already available through the Symbol::function_name member. What I could suggest you instead is to provide a GUI toggle that can switch between displaying the stripped function name or the whole name including everything that's following it. It might be useful when working with relocated symbols where some symbols have the same name but different address and object.

@CelestialAmber
Copy link
Contributor Author

Those are fair concerns; I do recall some map files having .c, .s, etc at the end. However, it would be very easy to remedy that issue - all that would need to be done is just to pick the last part of the split string, as it will always be the object file. I would be willing to test it against various map formats if so desired; I think that it would be a shame to just settle for a GUI toggle instead of being able to develop on top of this, perhaps adding the option to filter by object file (something I also want to try in the future if this is accepted). I feel that the debugging features in Dolphin have a lot of room for improvement and while I understand that this definitely needs to be made more robust, I would be willing to put in the necessary work.

Copy link
Contributor

@sepalani sepalani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your changes are going way beyond just adding the object name properties. You're changing the way symbol maps are loaded and saved.

I couldn't test it because it doesn't compile but I highly suspect that regular expressions might have a performance impact compared to the original code. The way you're matching for the object name might also be too strict.

IMO, I consider changing the name value not fine, as it forces you to change the logic at so many places. For instance, all those ternary could have been avoided.

In sum, I'd advise you to add an object_name member as a first PR. Then, build features on top of it with follow-up PRs.

Source/Core/Core/Boot/ElfReader.cpp Outdated Show resolved Hide resolved
Source/Core/Core/Debugger/RSO.cpp Outdated Show resolved Hide resolved
Source/Core/Core/PowerPC/PPCSymbolDB.cpp Outdated Show resolved Hide resolved
Source/Core/Core/PowerPC/PPCSymbolDB.cpp Outdated Show resolved Hide resolved
Source/Core/Core/PowerPC/PPCSymbolDB.cpp Outdated Show resolved Hide resolved
Source/Core/Core/PowerPC/PPCSymbolDB.h Outdated Show resolved Hide resolved
@CelestialAmber
Copy link
Contributor Author

Given how large the PR has grown, I suppose it is reasonable that only the object name split changes are included in this PR, and the rest can be considered after.

That being said, I did timing tests, and this is what I got:

Original:
Debug: 0.885 s
Release: 0.146 s

Branch:
Debug: 33.097 s
Release: 1.0899 s

So, needless to say, there is definitely lots of room for improvement.

@CelestialAmber
Copy link
Contributor Author

I decided to essentially revert the regex changes for this PR, and add a few additional checks to handle getting the actual symbol name and object name. The performance is only slightly worse than the current mainline branch:

Debug: 1.507 s
Release: 0.187 s

@CelestialAmber CelestialAmber changed the title PPCSymbolDB: Store symbol object names from loaded symbol maps separately Core: Store object name separately for symbols Oct 18, 2024
@CelestialAmber CelestialAmber force-pushed the mwld-map branch 2 times, most recently from 3e853bb to 568779d Compare October 18, 2024 17:24
@JMC47
Copy link
Contributor

JMC47 commented Oct 18, 2024

@dolphin-emu-bot rebuild

Copy link
Contributor

@sepalani sepalani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested it with some symbol maps and it doesn't play well with them.

Could you please squash your commits into a single one and address the buildbot lint and build failures?

Source/Core/Core/PowerPC/PPCSymbolDB.cpp Outdated Show resolved Hide resolved
std::string line = fmt::format("{0:08x} {1:08x} {2:08x} {3} {4}", symbol->address, symbol->size,
symbol->address, 0, symbol->name);
// Also write the object name if it exists
if (!symbol->object_name.empty())
line += fmt::format(" {0}", symbol->object_name);
line += "\n";
f.WriteString(line);
}

// Write .data section
f.WriteString("\n.data section layout\n");
for (const auto& symbol : data_symbols)
{
// Write symbol address, size, virtual address, alignment, name
f.WriteString(fmt::format("{0:08x} {1:08x} {2:08x} {3} {4}\n", symbol->address, symbol->size,
symbol->address, 0, symbol->name));
std::string line = fmt::format("{0:08x} {1:08x} {2:08x} {3} {4}", symbol->address, symbol->size,
symbol->address, 0, symbol->name);
// Also write the object name if it exists
if (!symbol->object_name.empty())
line += fmt::format(" {0}", symbol->object_name);
line += "\n";
f.WriteString(line);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some symbols have spaces when not mangled, so separating the symbol name and the object name with a space instead of a tab (like codewarrior symbol map) will most likely break Dolphin's parsing when loading these saved map.

Zelda TP:

00000000 000054 80006920  4 SECallback(int, int) 	d_home_button.o

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part isn't addressed. I highly recommend you to separate the symbol name and its object with a tab. Some symbols have multiple objects (e.g. AIInit<space><tab>ai.a<space>ai.o).

Source/Core/Core/PowerPC/PPCSymbolDB.cpp Outdated Show resolved Hide resolved
@CelestialAmber CelestialAmber force-pushed the mwld-map branch 2 times, most recently from 9a94563 to ce68a03 Compare October 24, 2024 19:27
@mitaclaw
Copy link
Contributor

@dolphin-emu-bot rebuild

std::string line = fmt::format("{0:08x} {1:08x} {2:08x} {3} {4}", symbol->address, symbol->size,
symbol->address, 0, symbol->name);
// Also write the object name if it exists
if (!symbol->object_name.empty())
line += fmt::format(" {0}", symbol->object_name);
line += "\n";
f.WriteString(line);
}

// Write .data section
f.WriteString("\n.data section layout\n");
for (const auto& symbol : data_symbols)
{
// Write symbol address, size, virtual address, alignment, name
f.WriteString(fmt::format("{0:08x} {1:08x} {2:08x} {3} {4}\n", symbol->address, symbol->size,
symbol->address, 0, symbol->name));
std::string line = fmt::format("{0:08x} {1:08x} {2:08x} {3} {4}", symbol->address, symbol->size,
symbol->address, 0, symbol->name);
// Also write the object name if it exists
if (!symbol->object_name.empty())
line += fmt::format(" {0}", symbol->object_name);
line += "\n";
f.WriteString(line);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part isn't addressed. I highly recommend you to separate the symbol name and its object with a tab. Some symbols have multiple objects (e.g. AIInit<space><tab>ai.a<space>ai.o).

Source/Core/Core/Debugger/RSO.cpp Outdated Show resolved Hide resolved
Source/Core/Core/PowerPC/PPCSymbolDB.cpp Outdated Show resolved Hide resolved
Source/Core/Core/Boot/Boot.cpp Outdated Show resolved Hide resolved
Source/Core/DolphinQt/Debugger/CodeWidget.cpp Outdated Show resolved Hide resolved
Source/Core/DolphinQt/Debugger/CodeWidget.cpp Outdated Show resolved Hide resolved
Source/Core/DolphinQt/Debugger/CodeWidget.cpp Outdated Show resolved Hide resolved
Source/Core/Core/PowerPC/PPCSymbolDB.cpp Outdated Show resolved Hide resolved
Source/Core/Core/PowerPC/PPCSymbolDB.cpp Outdated Show resolved Hide resolved
Source/Core/Core/PowerPC/PPCSymbolDB.cpp Outdated Show resolved Hide resolved
@CelestialAmber
Copy link
Contributor Author

@dolphin-emu-bot rebuild

Comment on lines 407 to 408
std::vector<std::string> parts = SplitString(name, '\t');
size_t num_parts = parts.size();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't they be marked as const?

Source/Core/Core/PowerPC/PPCSymbolDB.cpp Outdated Show resolved Hide resolved
Source/Core/Core/PowerPC/PPCSymbolDB.cpp Outdated Show resolved Hide resolved
Source/Core/Core/PowerPC/PPCSymbolDB.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@sepalani sepalani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were those leading spaces change intentional? If so, please make such change appear in the commit message and/or in the GitHub conversation as they can be missed quite easily. If they weren't intentional, please make sure you double-check your change before pushing them.

Otherwise, you should also squash your commits into a single one.

Source/Core/Core/PowerPC/PPCSymbolDB.cpp Outdated Show resolved Hide resolved
Source/Core/Core/PowerPC/PPCSymbolDB.cpp Outdated Show resolved Hide resolved
@CelestialAmber
Copy link
Contributor Author

The leading spaces are intentional.

Copy link
Contributor

@sepalani sepalani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This leading spaces change will likely break tools designed to load Dolphin generated symbol maps, especially if they don't take into account leading spaces.

There exist symbol maps generated without these leading spaces. However, I can't confirm that these were generated with CW.

Source/Core/Core/PowerPC/PPCSymbolDB.cpp Outdated Show resolved Hide resolved
Source/Core/Core/PowerPC/PPCSymbolDB.cpp Outdated Show resolved Hide resolved
@CelestialAmber
Copy link
Contributor Author

This leading spaces change will likely break tools designed to load Dolphin generated symbol maps, especially if they don't take into account leading spaces.

There exist symbol maps generated without these leading spaces. However, I can't confirm that these were generated with CW.

I understand there likely are some tools that might break from this, however there also are many tools that account for, or even assume the CodeWarrior symbol map format, which always includes leading spaces before symbols. For example, https://github.com/Cuyler36/Ghidra-GameCube-Loader/ does not support Dolphin symbol maps as is.

@sepalani
Copy link
Contributor

This leading spaces change will likely break tools designed to load Dolphin generated symbol maps, especially if they don't take into account leading spaces.
There exist symbol maps generated without these leading spaces. However, I can't confirm that these were generated with CW.

I understand there likely are some tools that might break from this, however there also are many tools that account for, or even assume the CodeWarrior symbol map format, which always includes leading spaces before symbols. For example, https://github.com/Cuyler36/Ghidra-GameCube-Loader/ does not support Dolphin symbol maps as is.

IIRC, Dolphin already provides scripts to save/load symbol maps from/to IDA/Ghidra in Tools.

Your PR is changing how symbol maps are saved, which wasn't the initial goal of your PR, nor was it in your initial code.

I'm fine with adding the object info to the Symbol struct. However, if your plan is to load/save CW symbol maps in a strict manner, then you should provide the appropriate functions instead of changing the existing ones. For instance, you can add in the GUI (the load/save dialog or a dedicated menu) another symbol map format support.

@CelestialAmber
Copy link
Contributor Author

Fine, I'll compromise. I would like to discuss things further though later.

@CelestialAmber
Copy link
Contributor Author

@sepalani Any updates?

Copy link
Contributor

@sepalani sepalani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code LGTM, tested with some games and it seems to work properly.

I think that it'd be better to have at least someone else (with a more involved symbol map/debugger use) to test this PR to make sure that it doesn't break on their end. Maybe @TryTwo or @dreamsyntax?

@dreamsyntax
Copy link
Member

I'll try with a few maps and see how it looks soon.

@dreamsyntax
Copy link
Member

dreamsyntax commented Nov 25, 2024

It would be nice to warn somehow that existing maps will be formatted, for those who do group collab/diffing - it could get messy if they don't know.

-800ab58c 00000030 800ab58c 0 HAS_Player::StartJumpCommand_AND_Player::TouchHangCommand(?Regular2PWorkingPulley?)_800ab58c_
+800ab58c 000030 800ab58c 0 HAS_Player::StartJumpCommand_AND_Player::TouchHangCommand(?Regular2PWorkingPulley?)_800ab58c_^M

[x] Old map format loads fine, saves in new format
[x] .data section preserved as expected for non-functions
[x] Total lines matching 1:1 - tested with GUP*.map and RSOs from RSVE8P
[x] Test a game with full built-in symbols (Spyro A Hero's Tail)

Edit: If you saw my message about an issue with Spyro, disregard. I made a mistake in my diffing.

@dreamsyntax
Copy link
Member

Approved with the caveat that I cannot explicitly test the .a/.o detection/storage part of the PR as none of my games (that I know of) have this condition.

My other user made maps and built in maps work as expected.

@TryTwo
Copy link
Contributor

TryTwo commented Nov 26, 2024

I don't have anything to test. All my personal symbols were switched over to an extra note-layer in my personal debugger builds, because I like to mark certain instructions on top of symbol marking the functions.

@OatmealDome OatmealDome merged commit 57b1234 into dolphin-emu:master Dec 7, 2024
13 checks passed
dreamsyntax added a commit to ShadowTheHedgehogHacking/shadow-symbols-and-dmw that referenced this pull request Dec 8, 2024
@CelestialAmber CelestialAmber deleted the mwld-map branch December 8, 2024 01:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

7 participants